********************************************************************************
***************** Creating Gabapentinoid code lists ****************************
************************* Chapter 3 Example 1 **********************************

*     From the hypothetical eRAP, we want oral gabapentinoids.
*     Gabapentinoids comprise gabapentin and pregabalin, which are sold under the following brand names:
*     Gabapentin: 'Neurontin'
*     Pregabalin: 'Lyrica', 'Alzain', 'Axalid', 'Lecaent' and 'Rewisca'
*     The generic and brand names are therefore our inclusion search terms. Because we only want oral gabapentinoids, we also exclude non-oral formulations.
*     Since these are drugs, we search the Aurum product file.




***************** Step 1: Set your working directory ***************************

*     This is the folder you want to work in. Replace the file path below with the path to your Aurum Code Browser folder.
cd "My file path:\...\CPRD_CodeBrowser_202211_Aurum\CPRD_CodeBrowser_202211_Aurum"

*     The log command records the Stata code and its output. As above, replace the file path with the path to the Logs folder in your Aurum Code Browser folder.
*     This log is named 'nameoflog' so the code is easy to paste, but including the code list name is more informative, e.g. 'Gabapentinoid_codelist_log'.
log using "My file path:\...\CPRD_CodeBrowser_202211_Aurum\CPRD_CodeBrowser_202211_Aurum\Logs\nameoflog"
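*     Optional sketch (not part of the original workflow): re-running this do-file stops at -log using- if a log is still open or the log file already exists.
*     A common pattern is to close any open log first and overwrite the old file. Uncomment these two lines and use them in place of the log command above if needed:
*     capture log close
*     log using "My file path:\...\CPRD_CodeBrowser_202211_Aurum\CPRD_CodeBrowser_202211_Aurum\Logs\nameoflog", replace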




***************** Step 2: Importing Product file *******************************

*     Long numeric identifiers lose precision if Stata stores them as numbers, so they must be imported as strings. I usually import all variables as strings and change variable types afterwards.
*     This keeps the long identifiers that CPRD provides fully intact.
import delimited using "CPRDAurumProduct.txt", stringcols(_all) bindquote(nobind) 
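*     Optional check (a sketch): confirm that the product code identifier was read in as a string.
*     This assumes the identifier column is named prodcodeid after import (import delimited converts variable names to lowercase by default).
*     -confirm- stops the do-file with an error if prodcodeid is not a string variable.
confirm string variable prodcodeid
describe prodcodeid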



***************** Step 3: Setting terms to lowercase ***************************
*     strmatch() is case sensitive, so we create lowercase copies of the variables we will search (ustrlower() converts all letters to lowercase)
gen emisterm_lower = ustrlower(termfromemis)
gen productname_lower = ustrlower(productname)
gen drugname_lower = ustrlower(drugsubstancename)
gen formulation_lower = ustrlower(formulation)
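*     Illustration (the product name below is made up for the example): a lowercase pattern misses a capitalised term until the term itself is lowercased.
display strmatch("Lyrica 75mg Capsules", "*lyrica*")
*     (displays 0)
display strmatch(ustrlower("Lyrica 75mg Capsules"), "*lyrica*")
*     (displays 1)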





***************** Step 4: Searching for terms **********************************
*     Create a variable that flags whether the term is a possible gabapentinoid
gen poss_gabapentinoid = .

*     Term = Gabapentin

replace poss_gabapentinoid = 1 if strmatch(emisterm_lower, "*gabapentin*")
replace poss_gabapentinoid = 1 if strmatch(productname_lower, "*gabapentin*")
replace poss_gabapentinoid = 1 if strmatch(drugname_lower, "*gabapentin*")

replace poss_gabapentinoid = 1 if strmatch(emisterm_lower, "*neurontin*")
replace poss_gabapentinoid = 1 if strmatch(productname_lower, "*neurontin*")
replace poss_gabapentinoid = 1 if strmatch(drugname_lower, "*neurontin*")

*     Term = Pregabalin

replace poss_gabapentinoid = 1 if strmatch(emisterm_lower, "*pregabalin*")
replace poss_gabapentinoid = 1 if strmatch(productname_lower, "*pregabalin*")
replace poss_gabapentinoid = 1 if strmatch(drugname_lower, "*pregabalin*")

replace poss_gabapentinoid = 1 if strmatch(emisterm_lower, "*alzain*")
replace poss_gabapentinoid = 1 if strmatch(productname_lower, "*alzain*")
replace poss_gabapentinoid = 1 if strmatch(drugname_lower, "*alzain*")

replace poss_gabapentinoid = 1 if strmatch(emisterm_lower, "*axalid*")
replace poss_gabapentinoid = 1 if strmatch(productname_lower, "*axalid*")
replace poss_gabapentinoid = 1 if strmatch(drugname_lower, "*axalid*")

replace poss_gabapentinoid = 1 if strmatch(emisterm_lower, "*lyrica*")
replace poss_gabapentinoid = 1 if strmatch(productname_lower, "*lyrica*")
replace poss_gabapentinoid = 1 if strmatch(drugname_lower, "*lyrica*")

replace poss_gabapentinoid = 1 if strmatch(emisterm_lower, "*rewisca*")
replace poss_gabapentinoid = 1 if strmatch(productname_lower, "*rewisca*")
replace poss_gabapentinoid = 1 if strmatch(drugname_lower, "*rewisca*")

replace poss_gabapentinoid = 1 if strmatch(emisterm_lower, "*lecaent*")
replace poss_gabapentinoid = 1 if strmatch(productname_lower, "*lecaent*")
replace poss_gabapentinoid = 1 if strmatch(drugname_lower, "*lecaent*")


*     Quick visual check that the correct terms have been flagged
browse if poss_gabapentinoid==1
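*     A non-interactive complement to browse (a sketch): count the flagged rows and tabulate the drug substances they contain, which makes unexpected matches easy to spot in the log.
count if poss_gabapentinoid == 1
tab drugsubstancename if poss_gabapentinoid == 1, missing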

*     Alternatively, this loop flags the same terms as the replace commands above in fewer lines (one approach is enough; running both gives the same result):
foreach i in "*gabapentin*" "*neurontin*" "*pregabalin*" "*alzain*" "*axalid*" "*lyrica*" "*rewisca*" "*lecaent*" {
	replace poss_gabapentinoid = 1 if strmatch(emisterm_lower, "`i'") | strmatch(productname_lower, "`i'") | strmatch(drugname_lower, "`i'")
}
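*     A further option (a sketch, not the method used above): one regular expression with alternation (|) searches for all eight terms at once.
*     ustrregexm() returns 1 if the pattern occurs anywhere in the string, so no * wildcards are needed. The local macro name gaba_terms is arbitrary.
local gaba_terms "gabapentin|neurontin|pregabalin|alzain|axalid|lyrica|rewisca|lecaent"
foreach v in emisterm_lower productname_lower drugname_lower {
	replace poss_gabapentinoid = 1 if ustrregexm(`v', "`gaba_terms'")
}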




***************** Step 5: Keep wanted codes and export *************************
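*     Keep only the flagged terms, then inspect their formulations, because we only want oral gabapentinoids
keep if poss_gabapentinoid==1
tab formulation
*     Drop the non-oral formulation (gel). Check the tab output for any other non-oral formulations before moving on
drop if formulation_lower == "gel"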
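*     Dropping by formulation name only removes rows whose formulation is exactly "gel". A stricter oral filter is sketched below.
*     It assumes the product file has a routeofadministration column whose oral entries read "oral"; check the tabulation before uncommenting the keep.
*     -capture noisily- shows the table but does not stop the do-file if the column is absent.
capture noisily tab routeofadministration, missing
*     keep if ustrlower(routeofadministration) == "oral"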

*     Tidying the data for exporting and saving
drop emisterm_lower productname_lower drugname_lower poss_gabapentinoid formulation_lower
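*     Optional check before saving (a sketch, assuming the identifier is named prodcodeid): each product code should appear only once in the code list.
duplicates report prodcodeid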

*     Save the Stata dataset in the data folder within Codelists
save "My file path:\...\CPRD_CodeBrowser_202211_Aurum\CPRD_CodeBrowser_202211_Aurum\Codelists\data\gabapentinoids.dta"

*     Export the code list to Excel
export excel using "My file path:\...\CPRD_CodeBrowser_202211_Aurum\CPRD_CodeBrowser_202211_Aurum\Codelists\gabapentinoid.xlsx", firstrow(variables)
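*     If the do-file is run again, -save- and -export excel- stop with an error because the files already exist.
*     Adding the replace option overwrites them. Uncomment these lines and use them in place of the two commands above if needed:
*     save "My file path:\...\CPRD_CodeBrowser_202211_Aurum\CPRD_CodeBrowser_202211_Aurum\Codelists\data\gabapentinoids.dta", replace
*     export excel using "My file path:\...\CPRD_CodeBrowser_202211_Aurum\CPRD_CodeBrowser_202211_Aurum\Codelists\gabapentinoid.xlsx", firstrow(variables) replace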






*     Close log
log close
